NNSVS vs. Sinsy
This notebook presents audio samples for comparing NNSVS and Sinsy.
Models
sinsy_f00001j: Sinsy’s HMM-based SVS system.
sinsy_f00001j_dnn_beta4: Sinsy’s DNN-based SVS system.
nnsvs_yoko: NNSVS-based system trained on the publicly available version of the nit-song070 database. Specifically, we used 29 of its 31 songs for training. Note that the model parameters were initialized from models pre-trained on the kiritan_singing database (49 songs for training). The system therefore used 49 + 29 = 78 songs in total for training.
Notes
Training data: According to the latest Sinsy paper, the authors seem to have used 60 songs (out of 70) for training. Since the publicly available version of the nit-song070 dataset contains only a subset of the full dataset, we are unable to train NNSVS models under the same training data condition.
Date: Sinsy samples were generated on 2022/03/27 using https://www.sinsy.jp/.
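The training-data mismatch described above can be made concrete with a few lines of arithmetic. This is only a sketch: the constants are the song counts quoted in the text, and the variable names are chosen here for illustration.

```python
# Song counts quoted in the Models and Notes sections.
NIT_SONG070_PUBLIC = 31  # songs in the public nit-song070 release
NNSVS_TRAIN = 29         # of those, songs used to train nnsvs_yoko
KIRITAN_PRETRAIN = 49    # kiritan_singing songs used for pre-training
SINSY_TRAIN = 60         # songs reportedly used by Sinsy (out of 70)

# Total number of songs the NNSVS system saw (pre-training + training).
nnsvs_total = KIRITAN_PRETRAIN + NNSVS_TRAIN

# Songs of the public release left out of training.
held_out = NIT_SONG070_PUBLIC - NNSVS_TRAIN

# Target-singer data available to NNSVS relative to Sinsy.
target_ratio = NNSVS_TRAIN / SINSY_TRAIN

print(f"NNSVS total songs: {nnsvs_total}")
print(f"Held-out public songs: {held_out}")
print(f"Target-singer songs, NNSVS / Sinsy: {target_ratio:.2f}")
```

Although NNSVS saw more songs overall, fewer than half as many come from the target singer, so the two systems are not compared under matched conditions.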
Preparation
[1]:
%%capture
try:
    import nnsvs
except ImportError:
    ! pip install git+https://github.com/r9y9/nnsvs
[2]:
%pylab inline
%load_ext autoreload
%autoreload
import IPython
from IPython.display import Audio
from scipy.io import wavfile
import pysinsy
from nnmnkwii.io import hts
from urllib.request import urlretrieve
import tempfile
%pylab is deprecated, use %matplotlib inline and import the required libraries.
Populating the interactive namespace from numpy and matplotlib
[3]:
from nnsvs.pretrained import create_svs_engine
import nnsvs
[4]:
def svs_display(model, xml_file):
    # Synthesize singing from a MusicXML score with a pre-trained NNSVS model
    engine = create_svs_engine(model)
    # Convert the score to HTS-style full-context labels
    contexts = pysinsy.extract_fullcontext(xml_file)
    labels = hts.HTSLabelFile.create_from_contexts(contexts)
    wav, sr = engine.svs(labels)
    IPython.display.display(Audio(wav, rate=sr))


def wav_display(url):
    # Download a pre-generated WAV file and play it inline
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        urlretrieve(url, f.name)
        sr, wav = wavfile.read(f.name)
        IPython.display.display(Audio(wav, rate=sr))
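wav_display reopens the temporary file by name while it is still open, which works on Unix but not on Windows. One possible alternative, sketched below with the standard library only, is to keep the downloaded bytes in memory and inspect them directly. The helpers `wav_info` and `make_silence` are hypothetical names introduced for this example, and the sketch assumes plain PCM WAV data.

```python
import io
import wave


def wav_info(data: bytes):
    """Return (sample_rate, n_channels, duration_seconds) for PCM WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as f:
        sr = f.getframerate()
        n_frames = f.getnframes()
        return sr, f.getnchannels(), n_frames / sr


def make_silence(sr=48000, seconds=0.5):
    """Build a short mono 16-bit silent WAV in memory (demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sr)
        f.writeframes(b"\x00\x00" * int(sr * seconds))
    return buf.getvalue()


# Stand-in for the bytes of a downloaded sample.
data = make_silence(sr=16000, seconds=0.25)
print(wav_info(data))
```

In the notebook, the bytes would instead come from something like `urllib.request.urlopen(url).read()`, and `IPython.display.Audio` can also be given raw WAV bytes through its `data` argument.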
Sample 1: げんこつ山のタヌキさん
[5]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/qq6w7bbcc5ikcdf/sinsy_song070_f00001j_063.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("song070_f00001_063"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/4epe08wqebyuh4g/sinsy_song070_f00001j_dnn_beta4_063.wav?dl=1")
sinsy_f00001j
nnsvs_yoko
sinsy_f00001j_dnn_beta4
Sample 2: Get Over
[6]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/kam9kju97umi6li/sinsy_f00001j_get_over.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("get_over"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/7st0acvguvbdoaj/sinsy_f00001j_dnn_beta4_get_over.wav?dl=1")
sinsy_f00001j
nnsvs_yoko
sinsy_f00001j_dnn_beta4
Sample 3: 雪
[7]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/ho5xgkil8r3f3ed/sinsy_yuki_f00001j.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("yuki"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/jo2ool0nytzxln2/sinsy_yuki_f00001j_dnn_beta4.wav?dl=1")
sinsy_f00001j
nnsvs_yoko
sinsy_f00001j_dnn_beta4
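The three sample cells above follow the same pattern: a Sinsy HMM sample, the NNSVS synthesis, then a Sinsy DNN sample. A minimal sketch of factoring that pattern out is shown below. `build_plan` and `run_plan` are hypothetical helpers, not part of NNSVS, and the display functions are passed in as arguments so that the logic can be run without the audio libraries.

```python
MODEL = "r9y9/yoko_latest"


def build_plan(sinsy_hmm_url, xml_name, sinsy_dnn_url, model=MODEL):
    """List the three systems in the order used by the sample cells."""
    return [
        ("sinsy_f00001j", "wav", sinsy_hmm_url),
        ("nnsvs_yoko", "svs", (model, xml_name)),
        ("sinsy_f00001j_dnn_beta4", "wav", sinsy_dnn_url),
    ]


def run_plan(plan, show_wav, show_svs):
    """Print each system label, then call the matching display function."""
    for label, kind, arg in plan:
        print(label)
        if kind == "wav":
            show_wav(arg)
        else:
            show_svs(*arg)


# Dry run with stand-in callables that only record what would be displayed.
calls = []
plan = build_plan("hmm.wav", "yuki", "dnn.wav")
run_plan(
    plan,
    show_wav=lambda url: calls.append(("wav", url)),
    show_svs=lambda model, name: calls.append(("svs", model, name)),
)
```

In the notebook itself, `show_wav` would be `wav_display`, and `show_svs` would be a small wrapper such as `lambda model, name: svs_display(model, nnsvs.util.example_xml_file(name))`.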